Seed quantity affects the fungal community composition detected using metabarcoding

Pest introductions via trade in tree seed may result from a lack of adequate survey and validation protocols. Developing better diagnostic protocols to identify potentially harmful pests and pathogens in forest tree seed is of critical importance. High-throughput sequencing-based barcoding and metabarcoding provide effective tools for screening potentially harmful organisms in various plant materials, including seeds. However, the sample size needed to detect the total microorganism diversity of a community is a major challenge in microbiome studies. In this work, we examined how increasing sample size (ranging between 100 and 1000 seeds) influences diversity of fungal communities detected by high throughput sequencing in Pinus sylvestris seeds. Our results showed that as sample size increased, fungal alpha diversity also increased. Beta-diversity estimators detected significant differences between the mycobiota from different samples. However, taxonomic and functional diversity were not correlated with sample size. In addition, we found that increasing the number of PCR replicates resulted in a higher abundance of plant pathogens. We concluded that for the purpose of screening for potentially harmful pathogens using HTS, greater efforts should be made to increase the sample size and replicates when testing tree seed.

Increased connectivity and globalization have greatly accelerated the frequency and magnitude of biological invasions around the globe 1 . Huge numbers of plants and plant products, many imported from developing countries, circulate within the market, potentially enhancing the spread of alien pests and pathogens 2,3 . For example, plant trade accounts for 57% of pest introductions into Europe alone 3 .
Long-term impacts of invasive alien forest pests and pathogens, including costs for management and derived economic loss, grow inexorably as the number of introductions increases. Phytosanitary requirements for inter- The new rules were implemented with several acts, including lists of regulated pests, high-risk plants and plant products and priority pests. Although this new regulation introduces more effective measures than previously in place for the protection of the European territory, discrepancies among member states in the application of the regulations could result in a weakening of the EU biosecurity status 5 . Another issue not considered in the new legislation is the possibility that alien pests and pathogens enter states on either non-or unknown hosts and asymptomatic hosts 6,7 .
As with the previous EU Plant Health Directive (2000/29/EC), EU Regulation 2016/2031 does not provide phytosanitary rules for trade in all forest seeds. Based on the pest risk analysis issued by the European Food and Environment Safety Agency (EFSA), contaminated seeds can be pathways for the potential introduction of Fungal α-diversity between different amounts of seed samples. All rarefaction curves approached the saturation plateau, which indicated that the sequencing effort detected the full diversity of the fungal communities (Fig. 1). The correlation between sequencing depth and number of OTUs was not significant (Pearson's correlation coefficient r = 0.003; P > 0.05). However, rarefaction curves demonstrated that 15 replicates were characterized by a different abundance of OTUs, with the highest values for sample F3.
The number of OTUs per sample (A-F) ranged from 101 to 178, with a total of 625 unique OTUs across the dataset. There was a linear correlation between sample size and number of OTUs (Linear regression, R2 = 0.90; P < 0.05). Approximately 14% of the OTUs (87) were shared among all samples (Fig. 2). With increasing sample size, higher numbers of unique OTUs were observed (Pearson's correlation coefficient r = 0.98, P = 0.002).   (Fig. 2). The α-diversity species richness, based on Chao-1, Shannon and Simpson indices, tended to increase as sample size increased (Fig. 3). The pairwise comparison of samples based on the Chao-1 index supported the distinction of samples in two group: Group 1 (samples A and B; extracted from equal to or less than 200 seeds) and Group 2 (samples D, E and F; samples obtained from greater than 600 seeds) (Student t-test, P < 0.05), and were then confirmed by the ANOVA test (post-hoc Tukey test; P < 0.05). Conversely, Shannon and Simpson indices did not differ significantly between samples (Student T-test; P > 0.05) (Fig. 3).
Fungal diversity among replicates. The average of core OTUs (number of OTUs in all replicates) among three replicates was 20 ± 3%, with no significant differences among samples (Chi-square test, P > 0.05). About 66% (A) to 57% (F) of unique OTUs were only captured in a single sub-sample replicate, and no correlation with sample size was found (Pearson's correlation coefficient, P > 0.05). OTUs richness (total number of OTUs) in a sample increased with additional replicates, regardless of the number of seeds sampled. A single replicate captured approximately 50% of the richness. Adding a second replicate and a third replicate led to the apparent richness increasing by approximately 30% and 20%, respectively (Fig. 4). The OTUs present in a single replicate were mainly low-abundant taxa (< 50 reads). The threshold of proportional abundance required for an OTU to be detected in all 3 PCR replicates was 10% of the total reads of that OTU. The coefficient of variation of α-diversity indices (% CV) among replicates was not associated with sample size (Linear regression; P < 0.05). Fungal community structure at OTU level. Differences between communities were evaluated using as explanatory variable the OTUs abundance by the two non-parametric tests of analysis of similarities (ANOSIM) and PERMANOVA. Both analysis confirmed that communities were separated with high dissimilarity between groups (P = 0.0001; 1000 permutations) and separation was large (ANOSIM R = 0.69 and PERMANOVA F-statistics = 3.8).
The potential effects of technical variation on the differentiation of microbial communities at different sample sizes were determined using Non-Metric Multidimensional Scaling analysis (NMDS). Replicates mainly grouped according to the number of seeds (Fig. 5). Samples A and B partially overlapped, whereas samples D, E and F were well separated.
Fungal community structure at the taxonomic level. OTUs retrieved from seed samples were classified in four phyla: Ascomycota, Basidiomycota, Chytridiomycota and Mucoromycota. Approximately 9% of the OTUs remained unclassified (range: 1-28% among samples). All samples appeared dominated by Ascomycota, which accounted for an average of 90% (range: 88-97% among samples) of the total fungal diversity in the samples, followed by Basidiomycota (range: 0-1% among samples). The relative abundances (read abundance) of Chytridiomycota and Mucoromycota were lower than 0.00001% in all samples. The remaining sequences (9%), consistently detected in all samples, were mostly affiliated to uncultured fungi, matching members of "Fungi spp. " At the order level, approximately 34% of OTUs were unclassified. Diaporthales and Pleosporales were the most abundant orders with about 14% and 20% of the total reads and they were equally present in all samples (Chi-square test; P > 0.05). www.nature.com/scientificreports/ At the genus level, a total of 148 genera were found. The genera Sydowia dominated the fungal community (8% of the total OTUs), followed by Alternaria (2% of the total OTUs). A total of 32 genera were common to all samples, while 60 genera were detected only on one of the five samples analyzed.
Of the 625 OTUs, 155 were identified at the species level with Sydowia polyspora being the most abundant (55% of total reads) followed by Dothiorella sarmentorum (15% of total reads), Epiccoccum nigrum (6% of total reads) Alternaria alternata (4% of total reads) and Alternaria brassicae (3% of total reads). Species composition differed between samples, although 14 taxa (Alternaria alternata, Cryptococcus pinus, C. tephrensis, Dothiorella sarmentorum, Epicoccum nigrum, Exobasidium maculosum, Fusarium avenaceum, Nectria balsamea, Pleuroceras pseudoplatani, Rosellinia aquila, Sarocladium strictum, Stagonospora pseudopaludosa, Sydowia polyspora, Vishniacozyma carnescens) were common in all samples, with a minimum occupancy in the dataset of approximately 0.0005% of total reads. A minimum portion of the total reads (0.01%) was identified as rare species, i.e. present www.nature.com/scientificreports/ only in one sample. These reads represent a total of 48% (n = 155) of the taxa identified to species level, and equally distributed among samples (Chi-square test, P > 0.05). Most of the OTUs identified at the species level were considered cosmopolitan species. However, one OTU (matching Pestalotiopsis citrina) has been never detected in Europe and was equally distributed in different samples.
Fungal community structure at the functional level. The fungal community colonizing P. sylvestris seeds was assessed in terms of fungal guilds and trophic modes. The community was included in six trophic categories with confidence levels including "highly probable" and "probable", with the category "other" including "possible" and "unclassified" guilds: pathotroph,athotroph-saprotroph, pathotroph-symbiotroph, saprotroph, saprotroph-symbiotroph, symbiotroph. The relative read abundance of saprotroph fungi was higher than that of other groups (mean of the 15 replicates was 56%; range 36-87%). Pathotroph fungi, pathotroph-symbiotroph and pathotroph-saprotroph were less represented (12.4%, 5.2%, 2.6% of the total reads, respectively). There were no significant correlations between fungal functional groups (trophic modes and guilds) and seed samples. Table 1 shows the relative richness of the most abundant functional guilds (> 1% in at least one sample) among samples. The functional guild "Plant Pathogen" included 35 genera out of the 148 recorded genera (24%). However, approximately 27 genera, known to include plant pathogenic species, were classified in other guilds. The genus Alternaria, for example, was included in the Animal Pathogen-Endophyte-Plant Pathogen-Wood Saprotroph guild. The abundance (number of OTUs) of plant pathogens in a sample increased with additional replicates, for all samples (Fig. 6). However, the richness in terms of reads assigned to plant pathogens was not significantly affected by sample size (Pearson's correlation coefficient r = 0.78; P > 0.05).

Discussion
The high-throughput sequencing methodology used in the current study combined with FUNGuild analyses allowed detection of the DNA of a range of fungal taxa and demonstrated that fungal community diversity depends on the sampling intensity of seeds. This means that current protocols used in phytosanitary screening with a standard sample size (e.g. the EPPO requirement on number of seeds) may not account totally for all possible fungal taxa.  www.nature.com/scientificreports/ To our knowledge, this is the first work showing that sample size can affect the measurement and interpretation of biodiversity metrics in tree seeds. Safe trade of tree seeds requires adequate surveys and validated protocols in which each step should be well-defined, including the size of samples to be tested. Our results support the findings of previous eDNA studies that the choice of sample size should be considered in detection and monitoring studies. Results showed that samples have significant variability in their fungal microbiotas (betadiversity) but samples A and B had less variability than samples D through F (NMDS, Chao index). Moreover, larger samples (D-F) accumulated more rare OTUs. Penton, et al. 15 argued that optimal diversity was detected in large soil samples (10 g soil), capturing rare taxa and decreasing replicate variation, compared with 0.25 and 1 g samples. Similarly, Nascimento, et al. 19 , considering the effects of sample volumes in HTS of aquatic sediments, showed that large samples (> 1% of the sampled area) were required to obtain robust diversity measurement of both metazoan and non-metazoan eukaryotes.
It remains a critical question whether variation among replicates can be fully overcome by increasing sample size in community studies. Several studies focused on the effect of PCR replicates on ecological measures, also with inconsistent results. While many studies using different numbers of PCR replicates reported differences between taxonomic profiles 20-22 , Smith and Peay 23 concluded that increasing the number of PCR replicates does not affect ecological measures, especially the alpha diversity. However, according to Kauserud et al. 24 , the consistency between replicates might depend on the diversity and complexity of the fungal community considered. The impact of stochasticity during PCR might be higher where a high number of different fungi (and PCR templates) are present.
In our study, PCR replicates increased the possibility to detect new OTUs. In accordance with Shirazi et al. 24 , our study also showed that rare taxa were unique to one or a few replicates of a sample. Moreover, a high value of abundance was not the limit factor for an OTU to be present in all replicates. Thought collectively the unique OTUs account for a small proportion of the total microbiome, we cannot exclude they could play an important role in plant health.
It is worth noticing, that increasing the number of replicates in sampling comes at a cost in terms of finance and time. Nevertheless, replicate numbers should be defined based on sample characteristics. Ficetola, et al. 25 examining ancient DNA estimated that at least eight PCR replicates should be analysed for samples suspected as having low detection probability. Shirazi et al. 24 found that even 24 PCR replicates may be insufficient to catalogue biodiversity fully in soil samples and then they recommended a minimum of two PCR replicates to differentiate between sites.
In terms of the taxonomy and functional composition, fungal communities of P. sylvestris seeds were similar to those described previously 8 with the most abundant fungal phylum being Ascomycota. Similarly, a high presence of Diaporthales and Pleosporales was already reported in pine seeds 8 . At the species level, Sydowia polyspora Table 1. Relative richness of the most abundant functional guilds (> 1% in at least one sample) among samples reported as mean (SD). Different letters after the SD indicate significance at p < 0.05 (Tukey post hoc test).  www.nature.com/scientificreports/ dominated fungal communities in seed (relative abundance between 38.5 and 69.5%), which is in agreement with data from previous work on pine seeds 8 .

Functional guild
In this study, we confirm that HTS can be used to detect alien species and plant pathogens. In addition, increasing the sample size will result in a higher number of rare OTUs, and one cannot exclude the possibility that they may be potentially harmful plant pathogens. Thus, while there may be limitations to surveys in terms of resource, in the case of phytosanitary inspections, greater efforts should be made to increase the sample size and replicates when testing tree seed.
We recognize that the practical application of the HTS method, considering the number of samples and replicates that can be processed, will never capture the full diversity in samples, and screening only the DNA (as opposed to RNA) cannot discern the viability of fungal taxa detected. In any case, this method is useful as a rapid and first screening to detect potentially harmful pathogens from which more targeted diagnostics can then be performed (e.g. selective media for culturing, qPCR assays or RNAseq) to confirm identity and viability of pathogens. Albeit we should be aware that those taxa that are 'unknown' or yet to be described will inevitably escape our attention.

Methods
Pinus sylvestris seeds, 0.5 kg, collected in 2015 were obtained from a seed orchard in Sweden (Lilla Istad, Öland Island). The pre-determined thousand seed weight (TSW) of the seed lot was 6.2 g, with 100% purity, and a germination rate and viability of 98.50 and 99.75%, respectively. Seeds were not sterilized prior to DNA analyses in order to retain the microbial community on the seed coat which may in itself, serve as an introduction pathway for pathogens. DNA extraction, PCR and Illumina amplicon sequencing. Six samples, containing 100, 200, 400, 600, 800 and 1000 seeds (samples A-F), were randomly sorted from the seed lot by weighing the corresponding amount calculated according to the TSW (0.62 to 6.2 g). To allow for small scale heterogeneity within the seed lot, 0.62 g (100 seeds) was taken from the lot at each time and added to the relevant sample until the required final weight was attained (i.e.; 1 × 0.62 g seeds for sample A, 6 × 0.62 g for sample D, 10 × 0.62 g for sample F). Prior to sampling, the 0.5 kg seed lot was transferred to a plastic box, which was surface sterilized using 2% NaOCl (for 10 min) and 70% EtOH (for 5 min), and thoroughly mixed between sub-samplings. Seeds were lyophilized for 48 h in 50 ml Falcon tubes, before homogenizing samples to fine powder in a 25 ml metal grinding container containing a 6 mm diameter stainless steel ball using a mixer mill (Retsch GmbH and Co., KG, Haan, Germany). The metal container and balls were cleaned thoroughly between samples. A single DNA extraction was carried out for each of the six samples, on 30 mg homogenized tissue, using the E.Z.N.A. SP Plant DNA Kit (Omega Bio-tek, Doraville, Georgia) following the manufacturer's protocol for dry samples. DNA was quantified using a NanoDrop® (ND-2000 UV/vis Spectrophotometer, Thermo Fisher Scientific Inc., Wilmington, Delaware, USA).
The ITS2 region of the rDNA was amplified by PCR using the primers fITS7 (GTG ART CAT CGA ATC TTT G) 26  Bioinformatics and analysis of sequence data. Sequences were processed using QIIME version 1.7.0 28 . Poor quality reads (< Q20, minimum length = 150 bp, homopolymer = 6 bp) were removed. All sequence files were combined into a single fasta file. Operational taxonomic units [OTUs, sensu 29 ] were clustered with 97% similarity cut-off using UPARSE 30 and chimeric sequences identified and removed in de-novo mode using UCHIME with the default parameters (λ = 2, Minimum score (h) = 0.28; (β) = 8.0; n = 1.4, minimum number of diffs in a segment = 3; minimum divergence = 0.8) 31 . To reduce the effects of sequence number variation, samples were rarefied to the minimum sequencing depth (1 597 479 reads, sample B3) through a subset of randomly selected reads prior to downstream analysis 8 . OTUs were assigned to taxonomic affinities using the Basic Local Alignment Search Tool (BLAST) at the NCBI website (based on > 97% sequence similarity). OTUs were only assigned to species level if the query sequence matched database sequences from fungal isolates (including at least one vouchered specimen) where percentage sequence identity was > 97% and then performed a manual check of the E-values (E −100 ), to ensure no contradictions among different species within the same genus. At lower percentage sequence identity, the query read was identified to genus (identity 95-96%) family (identity 90-94%) order (identity 80-88%) and class level (identity < 80%) 10 . Non fungal-sequences were removed. www.nature.com/scientificreports/ were compared between groups using the ANOVA test with Tukey post hoc analysis (P > 0.05). Linearity of the association of the α-diversity and sample size was determined by linear regression. Correlation between the number of unique OTUs and sample size was assessed by the Pearson's correlation analysis. Venn diagrams were constructed to visualize the shared fungal communities among all samples. To explore relationships between the fungal community structures and numbers of seeds, PERMANOVA 32 and ANOSIM 33 analyses were implemented using a Bray-Curtis distance matrix and the OTUs abundance as explanatory variable. The ANOSIM test statistic (R) ranges from 0 to 1, where the value 0 indicates random grouping between samples and the value 1 indicates 100% dissimilarity between samples. Fungal community composition among samples and replicates was visualized using Non-Metric Dimensional Scaling (NMDS) on a Bray-Curtis distance matrix.

Comparisons of microbial communities and statistical analysis.
Several approaches were used to assess the reproducibility of replicates. To estimate OTUs overlap among three replicates the following formula was applied: where SOTUS 1 = the number of sequences within the shared OTUs of replicate 1 shared with other OTUs; SOTUS 2 = the number of sequences within the shared OTUs of replicate 2 shared with other OTUs; SOTUS 3 = the number of sequences within the shared OTUs of replicate 3 shared with other OTUs; TOTUs = total number of sequences from replicates 1, 2, and 3.
Dispersion of the variation among replicates was estimated using the coefficient of variation, or ratio of standard deviation to the mean, with the formula CV = s/x, where s = standard deviation and x̄ = mean. The higher the value of CV the more dispersion is present. When comparing the OTUs abundance across samples, a Chi-square test was usually used. The identity of OTUs was assessed considering ≥ 97% sequence identify, and ≥ 90% coverage as positive hits.
Each OTU was classified into an ecological guild using FUNGuild 34 to assess how much variation in community composition was represented by sample size. Those OTUs that were identified with a confidence level designated as "possible" 33 were included in the category "other" together with the OTUs which remained unclassified. Trophic modes and guilds were described for the OTUs that were identified with a confidence level designated as "highly probable" or "probable" 33 . In addition, OTUs which were placed in more than one guild were included in the " > 1 guild" category 35 . Finally, relative abundances of microbial taxa and guilds in each sample were calculated and compared using the Chi-square test. The significance level was set at 5%.
The geographical distribution of OTUs identified to a species level was checked using the USDA Fungal Database 36 .
Data were analysed in R version 3.5.1 37  www.nature.com/scientificreports/ Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http:// creat iveco mmons. org/ licen ses/ by/4. 0/.